Abstract

Dynamic reconfiguration in digital systems provides valuable flexibility and
opportunities for enhanced efficiency, but also leads to increased complexity in
terms of design analysis and optimization. Existing approaches focus primarily on
either abstract models with the capability of expressing dynamic reconfiguration at
a high level or techniques for low-level, platform-specific implementation. While
both of these areas of advancement are important, there is an increasing need to
bridge the gap between them in order to better realize the potential of dynamic
reconfiguration technology.

In this paper, we provide background on relevant methods for application
modeling and platform-based implementation of dynamically reconfigurable signal
processing systems. To help bridge the gap between these groups of methods,
we develop new methods based on an application modeling formalism called
parameterized dataflow, along with techniques for mapping parameterized dataflow
specifications onto FPGA architectures.

Our proposed parameterized dataflow approach for design and implementation
of dynamically reconfigurable signal processing systems provides a comprehensive
framework that encompasses application modeling, task scheduling, and
hardware mapping. We demonstrate our methods using case studies in the
domains of wireless communication and computer vision.
1.  Introduction

As the speed and logic capacity of field programmable gate arrays (FPGAs)
have been improving steadily, FPGAs have become increasingly attractive for a
wide variety of signal processing systems. FPGAs are increasingly employed
in the form of platform FPGAs, which are integrated circuits that combine
significant amounts of configurable logic fabric along with additional subsystems,
such as application-specific accelerators, processor cores, memory blocks, and
input/output interfaces, to facilitate FPGA-based, system-on-chip design [54].
FPGA fabric is also integrated into application-specific integrated circuits (ASICs)
to allow implementations that provide a mix of programmable and custom
hardware (e.g., see [60]).

Through support for dynamic reconfiguration, modern FPGAs allow customization
of hardware structures both statically and at run-time, thus allowing
streamlining of processing configurations in response to application requirements or data
characteristics that are not known at design time. In addition to allowing for
dynamic changes in system functionality, dynamic reconfiguration, when carried out
effectively, can enhance performance, resource utilization, and energy efficiency
(e.g., see [22]).

However, in addition to such potential for improved operation, incorporating
dynamic reconfiguration into the digital system design space also brings
increased design complexity. Model-based design methodologies have been evolving
steadily over the years to help address issues of design complexity in
embedded systems [28]. In model-based design, applications are represented and
analyzed in terms of formal models of computation, which promote analysis of
functionality as well as hardware and software structure at a high level of
abstraction. In the domain of signal processing, model-based techniques based on
dataflow models of computation are particularly popular, and are employed in a
growing variety of design tools [7].

While dataflow techniques allow for high level reasoning about and manipulation
of application dynamics, there are important challenges in mapping dataflow
models into FPGA platforms in ways that systematically and effectively exploit
the dynamic reconfiguration capabilities of the platforms. This paper provides a
review of state-of-the-art model-based design techniques and FPGA implementation
techniques for signal processing systems, and explores the challenges
involved in effectively mapping high level application models into efficient
implementations on dynamically reconfigurable FPGA platforms.

The exploration presented in this paper on mapping models into implementations
builds on our earlier work in this area, which was presented in preliminary
form in [55]. The reconfiguration-aware mapping techniques presented in
this paper (Sections 4-5) go beyond the developments of [55] in a number of
ways. Specifically, this extended paper enhances the hardware architecture mapping
methodology of [55] and provides two alternative perspectives on scheduling.
These two perspectives affect important trade-offs between performance and
modularity. An important new aspect of one of these scheduling perspectives
is the integration of the recently developed dataflow schedule graph
model [57] into processes for FPGA mapping of dynamically reconfigurable
signal processing systems.

2.  Background 

2.1.  FPGA  Technology 

As shown in Figure 1, an FPGA can be viewed as a matrix of cells, which can
encapsulate various kinds of hardware structures, such as programmable logic
subsystems, memories, special purpose hardware subsystems (e.g., multipliers or
higher level signal processing accelerators), and embedded processor cores. A
ring of I/O (input/output) modules is placed along the periphery for connection to
the outside world, and routing channels, driven through configurable switches, are
used for communication among cells.

Although the details of how programmable logic subsystems are constructed
vary among different vendors, their basic structure, illustrated in Figure 2(a),
comprises M-input look up tables (LUTs) and D flip-flops (DFFs), which are
integrated into programmable logic blocks. Collections of such logic blocks can be
programmed to implement digital logic functions of arbitrary complexity.
Figure 2(b) illustrates, in simplified form, the structure of logic blocks that are
employed in FPGA devices made by Xilinx and Altera. These blocks contain
substructures for arithmetic and carry logic to support the flexible construction of
computational building blocks.
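
The LUT-plus-flip-flop structure described above can be modeled behaviorally in a
few lines. The following Python sketch is illustrative only: the class names LUT and
LogicElement, the truth-table encoding, and the bit ordering are assumptions made
for this example and do not correspond to any vendor's primitives.

```python
from typing import Sequence


class LUT:
    """Behavioral model of an M-input look-up table (hypothetical helper)."""

    def __init__(self, num_inputs: int, truth_table: Sequence[int]):
        if len(truth_table) != 2 ** num_inputs:
            raise ValueError("truth table must have 2**M entries")
        self.num_inputs = num_inputs
        self.truth_table = list(truth_table)

    def evaluate(self, inputs: Sequence[int]) -> int:
        # Assumption of this sketch: inputs[0] is the least significant
        # address bit of the table.
        address = sum(bit << i for i, bit in enumerate(inputs))
        return self.truth_table[address]


class LogicElement:
    """A LUT whose output can optionally be registered by a D flip-flop."""

    def __init__(self, lut: LUT, registered: bool):
        self.lut = lut
        self.registered = registered
        self.q = 0  # current flip-flop state

    def clock(self, inputs: Sequence[int]) -> int:
        """Apply inputs for one clock cycle and return the visible output."""
        d = self.lut.evaluate(inputs)
        if not self.registered:
            return d
        out = self.q  # a registered output lags the LUT by one cycle
        self.q = d
        return out


# "Programming" a 2-input LUT as XOR amounts to loading its truth table.
xor_lut = LUT(2, [0, 1, 1, 0])
```

Changing the function of such an element means loading a different truth table,
which is the sense in which the logic fabric is programmable.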

To connect logic blocks, flexible routing architectures are provided to
accommodate different kinds of interconnection patterns. The island-style routing
architecture is widely adopted in commercial FPGA devices. An island-style global
routing architecture involves routing channels on all four sides of the logic blocks.

Figure 1: Basic architecture of an FPGA (L: logic, M: memory, E: embedded unit).

The number of wires contained in a channel is set to a constant W during
fabrication, and is one of the key design decisions made in the FPGA architecture
design. Island-style routing architectures generally employ wire segments of
different lengths in each channel to allow optimized selection of lengths based on the
specific connections that are made. The end points of wire segments are typically
staggered so that logic blocks can be connected at the end points of wires that
have appropriate lengths.

Figure 3 illustrates channels and switches in an island-style FPGA architecture,
and also shows a routing example. The figure shows an interconnection that
starts from logic block 5, and goes through logic block 4 to logic block 2. Two
switch nodes, s1 and s2, are used. To connect logic block 5 to logic block 4,
programmable switch D in switch node 2 is configured. Similarly, programmable
switch B in switch node 1 is configured for the connection between logic block 4
and logic block 2.

There  are  six  possible  connections  for  a  programmable  switch,  as  illustrated 
in  Figure  4,  which  is  why  six  switches  (A  through  F)  are  depicted  in  Figure  3. 
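
One way to arrive at the count of six is to assume, as is common for island-style
switch points, that four wire ends meet at the switch (one per side) and that each
programmable switch joins one unordered pair of them. The side names in the
following Python sketch are hypothetical, and the mapping of pairs to the labels
A through F in Figure 3 is not specified here.

```python
from itertools import combinations

# Hypothetical names for the four wire ends that meet at one switch point.
sides = ["north", "east", "south", "west"]

# Each programmable switch joins one unordered pair of wire ends,
# so the number of switches is "4 choose 2".
connections = list(combinations(sides, 2))
```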

The  island-style  architecture  facilitates  optimization  of  the  physical  layout  of 
logic  blocks  and  their  surrounding  routing  channels,  and  its  regularity  facilitates 
efficient  delay  estimation  (e.g.,  see  [33]). 

Figure 2: Programmable logic blocks. (a) Primitive structures for constructing
programmable logic blocks. (b) A simplified illustration of logic blocks employed in
FPGA devices from Xilinx and Altera.



2.2.  Model-based  Design  of  Signal  Processing  Systems 

As described in Section 1, model-based techniques based on dataflow models
of computation are used widely in the design of signal processing systems.
Dataflow can be viewed as a special case of Kahn process networks (KPNs), which
are composed of processes that communicate through unbounded communication
channels [21]. Each communication channel is a first-in first-out (FIFO) queue
of tokens, where tokens encapsulate data values as they pass between processes.
FIFO communication channels in KPNs can be written to and read only from the
processes that are connected to them in the enclosing KPN. An important
restriction on execution of KPNs is that processes can access data from incident FIFOs
only through blocking read operations, which effectively suspend the processes
if requested data is not available, and allow the processes to resume only after
the associated read operations are completed. This restriction helps to ensure
determinacy of KPN-based representations. Variations on Kahn process networks
have been explored in recent years, such as polyhedral process networks and
reactive process networks, which provide useful new capabilities for modeling and
analysis of process and network execution [52, 36, 12].
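
The role of blocking reads can be illustrated with a small two-process network.
The following Python sketch is an assumption-laden illustration, not an
implementation from the cited work: the process names, the use of threads, and
the None end-of-stream marker are all choices made for this example. Python's
queue.Queue with its default maxsize of 0 serves as an unbounded FIFO, and its
get() method blocks until a token is available.

```python
import queue
import threading


def producer(out_fifo: queue.Queue, values):
    for v in values:
        out_fifo.put(v)   # unbounded FIFO: writes never block
    out_fifo.put(None)    # end-of-stream marker (assumption of this sketch)


def doubler(in_fifo: queue.Queue, out_fifo: queue.Queue):
    while True:
        token = in_fifo.get()  # blocking read: suspends until a token arrives
        if token is None:
            out_fifo.put(None)
            return
        out_fifo.put(2 * token)


def run_network(values):
    """Run producer -> doubler and collect the doubler's output stream."""
    a_to_b = queue.Queue()    # maxsize=0 (the default) means unbounded
    b_to_out = queue.Queue()
    threads = [
        # The consumer is started first on purpose; it simply blocks on get().
        threading.Thread(target=doubler, args=(a_to_b, b_to_out)),
        threading.Thread(target=producer, args=(a_to_b, values)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    results = []
    while True:
        token = b_to_out.get()
        if token is None:
            return results
        results.append(token)
```

Because the doubler can observe its input only through blocking reads, the output
stream does not depend on how the two threads happen to be interleaved.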

In the context of design and implementation of signal processing systems,
dataflow is a restricted form of KPN in which processes (called actors in the
context of dataflow representations) execute in terms of well-defined, discrete units of
execution, called firings, of the associated actors [27]. Such a discrete approach
to process firing can be enforced, for example, through enable-invoke semantics,
where fireability (availability of sufficient input data) is fully separated from
process execution (invoking) functionality [43].

Model-based design methods based on dataflow models of computation have
provided designers of signal processing systems with representations that
correspond intuitively to signal processing block diagrams. In such representations,
DSP applications are modeled as directed graphs, where vertices correspond to
dataflow actors and represent computational modules for executing (or firing) the
corresponding tasks, and edges represent inter-actor FIFO channels, as in KPNs.
Actors consume and produce tokens from their input and output edges,
respectively, as they are fired. In enable-invoke dataflow, an actor firing cannot begin
execution until all of the input data required by the firing is present on the relevant
input edges, and thus blocking reads are not employed [43].
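
The separation between fireability testing and execution can be sketched as an
actor with two methods. The following Python example is a minimal illustration
under stated assumptions: the class Downsampler, the method names enable and
invoke, and the helper fire_while_enabled are hypothetical names chosen here,
and the actor keeps the first token of each group purely for concreteness.

```python
from collections import deque


class Downsampler:
    """Hypothetical enable-invoke actor: consumes `factor` tokens, emits one."""

    def __init__(self, factor: int):
        self.factor = factor

    def enable(self, in_fifo: deque) -> bool:
        # Fireability test only: inspects the input and has no side effects.
        return len(in_fifo) >= self.factor

    def invoke(self, in_fifo: deque, out_fifo: deque) -> None:
        # Called only after enable() returned True, so no read can block.
        tokens = [in_fifo.popleft() for _ in range(self.factor)]
        out_fifo.append(tokens[0])


def fire_while_enabled(actor, in_fifo: deque, out_fifo: deque) -> int:
    """Fire the actor as many times as its input allows; return the count."""
    firings = 0
    while actor.enable(in_fifo):
        actor.invoke(in_fifo, out_fifo)
        firings += 1
    return firings
```

A scheduler built on this interface never needs to suspend an actor in the middle
of a firing, because every firing is started only when all of its input is present.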

Scheduling is a critical issue when implementing dataflow representations of
signal processing systems. Scheduling is the process of constructing a schedule,
which assigns each actor firing to a processing resource, and determines the
ordering of firings that share the same resource. In addition to affecting overall
system functionality (a schedule that is not constructed properly can cause
deviations from expected functionality), scheduling generally has a significant effect on
performance, resource utilization, and memory requirements.

Figure 4: Six possible connections for a programmable switch.

Synchronous dataflow (SDF), proposed in [26], is a restricted form of dataflow
in which each actor firing consumes and produces constant amounts of data from
each of its input and output edges, respectively. The SDF restriction is orthogonal
to enable-invoke dataflow. In an enable-invoke context, an SDF firing can execute
only after all of the required input data has arrived at the actor inputs. On the
other hand, if enable-invoke semantics is not enforced, then even though the firing
consumes constant amounts of data, it can begin execution when only part of its
input data is available, and employ blocking reads to read the rest of the data
(in a manner that is interleaved with the computations associated with the firing)
until the firing is complete. An important advantage of SDF is its support for
static scheduling, and a wide variety of scheduling techniques have evolved that
exploit this support (e.g., see [26, 14, 13, 39, 20, 49]).

Cyclo-static dataflow (CSDF) [9] is one of the most popular extensions of
SDF. CSDF generalizes SDF by allowing the consumption or production rate of
an actor port to vary as long as the pattern of variations forms a periodic sequence
that is statically known.

An example of an SDF actor is illustrated in Figure 5(a). Here, actor D
represents a downsampler that consumes three tokens generated from actor A and
transfers one among them onto the edge that is directed to actor B. An example
of a schedule for this graph is S1 = (3A)DB. This schedule is expressed in looped
schedule notation, where each parenthesized term represents a schedule loop that
is iterated a number of times specified by a positive integer iteration count that is
given as the first item in the term [8]. Thus, the schedule S1 corresponds to the
firing sequence (A, A, A, D, B).
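
The expansion of a looped schedule into a firing sequence, and the buffer
occupancy that the sequence implies, can be checked with a short simulation. The
following Python sketch is illustrative: the nested-list encoding of schedules and
the function names are assumptions made here, and the rates for actors A and B
(one token per firing) are inferred from the schedule (3A)DB rather than stated
explicitly in the text.

```python
def expand(schedule):
    """Flatten a looped schedule into a firing sequence.

    A schedule is a list whose items are actor names (str) or
    (iteration_count, sub_schedule) tuples representing schedule loops.
    """
    firings = []
    for item in schedule:
        if isinstance(item, str):
            firings.append(item)
        else:
            count, body = item
            firings.extend(expand(body) * count)
    return firings


# The schedule (3A)DB in the encoding assumed by this sketch.
S1 = [(3, ["A"]), "D", "B"]

# (source, sink) -> (tokens produced per source firing,
#                    tokens consumed per sink firing)
EDGES = {("A", "D"): (1, 3), ("D", "B"): (1, 1)}


def peak_buffer_sizes(firings, edges):
    """Simulate a firing sequence; return (peak, final) token counts per edge."""
    tokens = {e: 0 for e in edges}
    peak = {e: 0 for e in edges}
    for actor in firings:
        # Consume from every input edge of the fired actor.
        for (src, snk), (_, cons) in edges.items():
            if snk == actor:
                if tokens[(src, snk)] < cons:
                    raise RuntimeError(f"{actor} fired without enough input")
                tokens[(src, snk)] -= cons
        # Then produce onto every output edge.
        for (src, snk), (prod, _) in edges.items():
            if src == actor:
                tokens[(src, snk)] += prod
                peak[(src, snk)] = max(peak[(src, snk)], tokens[(src, snk)])
    return peak, tokens
```

Under these assumed rates, the schedule returns every edge to an empty state after
one pass, and the edge from A to D must hold up to three tokens.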

A CSDF version of this downsampler-based example is shown in Figure 5(b).
Here, firings of actor D execute based on a periodic sequence of three distinct
phases, and the inputs and outputs of the actor are annotated with the numbers of
tokens consumed and produced in these phases. Thus, D consumes one token on
each phase and produces one token on every third phase, starting with the second
phase. This CSDF graph can be executed with the schedule S2 = ADADBAD,
which differs from S1 in that less buffer space (one unit of storage versus three) is
needed for the edge (A, D). Indeed, the potential for improved memory
requirements is a useful feature of the CSDF model. A detailed comparison of SDF and
CSDF is developed in [41]. In addition, CSDF can be integrated with
parameterized dataflow (this will be discussed in the following section), and the resulting
model is called parameterized cyclo-static dataflow (PCSDF). Such integration,
explored in [15, 45], provides further flexibility in the modeling of application
dynamics compared to PSDF.
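
The buffer-size comparison between the two schedules can be reproduced with a
small phase-aware simulation. The following Python sketch is an illustration under
stated assumptions: the function name simulate_csdf and the edge encoding are
hypothetical, the SDF graph is expressed as a CSDF graph with single-phase rate
sequences, and the rates of A and B (one token per firing) are inferred from the
schedules rather than given explicitly.

```python
from itertools import cycle


def simulate_csdf(firings, edges):
    """Return the peak token count per edge for a CSDF firing sequence.

    edges maps (source, sink) to (production_phases, consumption_phases),
    where each phase list is repeated cyclically over successive firings.
    """
    tokens = {e: 0 for e in edges}
    peak = {e: 0 for e in edges}
    prod = {e: cycle(p) for e, (p, _) in edges.items()}
    cons = {e: cycle(c) for e, (_, c) in edges.items()}
    for actor in firings:
        for e in edges:
            if e[1] == actor:           # actor is the sink of edge e
                need = next(cons[e])
                if tokens[e] < need:
                    raise RuntimeError(f"{actor} fired without enough input")
                tokens[e] -= need
        for e in edges:
            if e[0] == actor:           # actor is the source of edge e
                tokens[e] += next(prod[e])
                peak[e] = max(peak[e], tokens[e])
    return peak


# SDF version: D consumes 3 and produces 1 on every firing.
SDF_EDGES = {("A", "D"): ([1], [3]), ("D", "B"): ([1], [1])}
# CSDF version: D consumes 1 per phase and produces on the second phase only.
CSDF_EDGES = {("A", "D"): ([1], [1, 1, 1]), ("D", "B"): ([0, 1, 0], [1])}

sdf_peak = simulate_csdf(list("AAADB"), SDF_EDGES)
csdf_peak = simulate_csdf(list("ADADBAD"), CSDF_EDGES)
```

With these inputs, the peak occupancy of the edge from A to D is three tokens for
the SDF schedule and one token for the CSDF schedule, matching the comparison
in the text.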


Figure 5: Alternative dataflow models for a simple multirate signal processing example.
(a) SDF specification and a corresponding schedule, (3A)DB. (b) CSDF specification
and a corresponding schedule, ADADBAD.

2.3.  Parameterized  Dataflow

Parameterized dataflow is a meta-modeling technique that integrates dynamic
parameterization into dataflow modeling in a systematic and general (applicable
to different dataflow models) manner.

In parameterized dataflow, sets of actor parameters, domains in which such
parameters can "reside", and configurations for such sets (bindings to actual
values within the respective parameter domains) are important aspects of application
models in addition to more conventional forms of dataflow information, such as
specifications on token production and consumption rates associated with actor
firings.

Settings or updates to parameter configurations can be distinguished as
happening statically, pre-execution, or dynamically. Static configurations are
determined at compile time (i.e., when the code for an implementation is derived).
Pre-execution configuration refers to configuration that occurs after compile time
but before a given execution of the application. In dynamic configuration or
dynamic reconfiguration, parameter values can be initialized or updated while the
application is executing.

For example, in an FPGA implementation, an addition component can be
configured by instantiating a look-ahead adder or a ripple carry adder, depending on
a trade-off assessment in terms of relevant area and performance constraints, and
such an assessment-configuration sequence can in general be carried out statically,
pre-execution, or dynamically depending on factors such as the overhead of
performing the configuration, and the degree to which this overhead is amortized by
the benefits of applying the selected configuration.

In this paper, we focus on methods for dynamic reconfiguration, which are
increasingly important in the development of signal processing systems that can
adapt in response to dynamically changing application constraints, data
characteristics, and other operating conditions [7].

The organization of the rest of this paper is summarized as follows. We first
review hardware methods for dynamic reconfiguration. We then show how
parameterized dataflow techniques integrated with the SDF model of computation can
be applied as an abstract model for design and implementation of dynamically
reconfigurable signal processing systems. This integrated model, called
parameterized synchronous dataflow (PSDF), has been studied in a variety of useful
design contexts before (e.g., see [3, 7]); in this paper, we present novel
methods for applying this model to the systematic hardware mapping of dynamically
reconfigurable signal processing systems.

Next, to represent adaptive schedules efficiently, we apply a schedule
representation called the dataflow schedule graph (DSG), which provides a formal
approach for representing dynamic interactions between applications and the
architectures on which they execute. We show that using the DSG model, hardware
mapping of PSDF can be interpreted naturally, and a structured path to efficient
implementation can be achieved. Using our methods based on the parameterized
dataflow, SDF, and DSG representations, conventional, ad-hoc methods for
dynamic reconfiguration can be replaced by formally-rooted, model-based
techniques that promote efficiency, reliable integration, and modularity through a
systematic, dataflow-based design flow.

3.  Dynamic  Reconfiguration  Techniques  in  FPGAs 

FPGAs are widely employed as signal processing platforms for both rapid
prototyping and optimized implementation. Because they allow customization of
digital hardware structures with significantly higher flexibility, lower cost, and
lower turnaround time, they provide a valuable alternative to ASIC
implementations when key application subsystems can be mapped efficiently into FPGA
structures.

Hybrid architectures that integrate FPGA fabric within an ASIC have been
explored to provide a wider range of trade-offs between efficiency and flexibility
(e.g., see [60]). When applying such a hybrid approach, key challenges include
the partitioning of designs into ASIC and FPGA parts, and integrating the ASIC
and FPGA design flows. For example, the ASIC/FPGA partition determines the
size (area) of the required FPGA subsystem, which in turn affects the FPGA
placement. Furthermore, timing characteristics between the boundaries of the
FPGA and custom logic subsystems complicate timing closure. For more details
on such challenges and proposed solutions, we refer the reader to [24].

Using modern FPGAs, designers can exploit high speed partial reconfiguration
technology (e.g., reconfiguration involving relatively small subsets of the
blocks shown in Figure 1) to incorporate dynamic hardware reconfiguration into
practical implementations. Our approach to applying dynamic reconfiguration
involves careful selection of FPGA components that will be reconfigured at run
time. The reconfigured circuits must satisfy the given area and timing constraints.
That is, the FPGA logic block and routing channel structures illustrated in Figure 2
and Figure 3 need to be taken into account when determining the circuits that will
be loaded in and out during run-time reconfiguration. While FPGA design flows
for static configuration perform a full configuration that can program the entire
target FPGA, partial reconfiguration preserves the programming in some regions
of the FPGA, and makes changes to other regions.

Commercial tools are available that provide support for dynamic reconfiguration.
For example, PlanAhead, a tool from Xilinx, allows users to specify regions
of a target FPGA that can be configured dynamically [59]. Communication
between such regions and statically- or pre-execution-configured regions relies on
the insertion of certain kinds of bus macros. In terms of the design hierarchy,
regions set up for dynamic reconfiguration must be represented through top-level
design modules.

Software programmable reconfiguration is a methodology that provides
dynamic reconfiguration capabilities for FPGAs in a more software-oriented way,
i.e., through interfacing with a soft-core processor [1]. Such dynamic reconfiguration
is achieved, in a manner analogous to software context switching, by changing
routing paths at run-time through control of the associated soft-core processor.

JBits SDK, another tool developed by Xilinx, contains a set of software
modules and application programming interfaces to create Xilinx Virtex bitstreams
from Java code [58]. Two tools for partial and remote reconfiguration that apply
JBits SDK are presented in [37]. One is the circuit customization tool, which
helps to create an interface to configure parameters of a circuit. The other is
the core unifier tool, which allows designers to manipulate connections between
cores using partial reconfiguration. When applying the JBits SDK, there are two
types of cores, called controller and slave cores. Controller cores are downloaded
statically onto an FPGA, and can communicate with slave cores, which can be
loaded dynamically. During execution, controllers can switch between slave cores
or replace existing slave cores by downloading new ones.

To facilitate simulation of designs that employ dynamic reconfiguration,
simulation techniques developed in [50] have been integrated into the traditional FPGA
design flow. This allows dynamic reconfiguration capabilities to be integrated into
simulations that employ existing simulation tools, such as NCVerilog and
ModelSim. In such simulations, circuits that are to be reconfigured (reconfiguration
circuits) need to be specified and encapsulated up front. Such design capture for
reconfiguration circuits helps to specify a reconfiguration schedule, which defines
the conditions under which each reconfiguration circuit in the system is loaded
or replaced. Specialized components, such as isolation switches and schedule
control modules, are typically applied for encapsulation and reconfiguration,
respectively.

Building on platform-based tools for dynamic reconfiguration, various design
approaches have been developed for FPGA system design. For example,
methods for real-time and power-aware implementation are developed in [2]. The
application of dynamic instruction sets in soft-core microprocessors is explored
in [53]. In this approach, if an FPGA cannot accommodate all of the relevant
instructions with their associated hardware support, support for selected
instructions can be configured dynamically as needed by the application. In [31, 32], a
design methodology called dynamic hardware/software partitioning is proposed
in which FPGAs are employed as coprocessors to accelerate
computationally-intensive loops. In [48, 18], an architecture is developed that contains a processor,
a bit-level reconfigurable part (similar to conventional FPGA fabric), and
components (tiles) for composing a novel form of reconfigurable subsystem called a field
programmable function array (FPFA). Such tiles can be viewed as word-level,
reconfigurable building blocks that are composed of ALUs and lookup tables. An
FPFA is constructed from FPFA tiles to accelerate intensive, regularly-structured
computations, such as linear interpolation or fast Fourier transforms. A class of
heterogeneous processing arrays that integrate signal processors and FPGA
subsystems, and are amenable to dataflow-based design and mapping techniques, is
explored in [56].

Development of applications on platform FPGAs, FPGAs that employ soft
core processors, and host-FPGA combinations often involves hardware/software
co-design as a key aspect. Sophisticated algorithms have been developed for
optimization of timing, area, and energy in HW/SW systems. For example, the
work presented in [47] takes as input a library containing general purpose
processors, dynamically reconfigurable FPGAs, communication links, and memory
modules, and applies an evolutionary algorithm to instantiate hardware resources,
and assign tasks and communication events to the resources. A dynamic priority
multirate scheduling algorithm determines the times at which the tasks and
communication events in the system occur. A framework called Nimble is presented
in [30]. This framework takes as input application specifications in C, and maps
them into implementations on a heterogeneous platform that includes a
general-purpose processor, an FPGA, and a memory hierarchy. An overview of various
hardware/software co-synthesis approaches for signal processing systems is
presented in [6].

Various methods have also been developed to help minimize overhead
associated with dynamic reconfiguration. For example, in [16], methods are developed
for evaluating the degree of computation-reconfiguration overlap in dynamically
reconfigurable systems. These methods are based on modeling of the dynamic
reconfiguration process, and identification of functional commonality between
reconfiguration circuits. In another approach, an incremental elaboration model is
applied to streamline requests for new reconfiguration operations by using
set-theoretic techniques to leverage known characteristics of existing (currently active)
configurations [11].

In this section, we have provided a brief overview of platform-based
technologies and tools for dynamic reconfiguration in FPGAs. For more details on design
flows for dynamically reconfigurable FPGA system implementation, we refer the
reader to [35] and [44]. Methods for consistency analysis of dynamic
reconfiguration functionality in dataflow graphs are discussed in [5, 38]. For comprehensive
reviews of FPGA technology and system design methods, we refer the reader
to [54, 46].

4.  Modeling  Dynamic  Reconfiguration  using  PSDF  Techniques 

In the remainder of this paper, we develop methods for systematic mapping
of model-based signal processing application representations into efficient
implementations on dynamically reconfigurable hardware.

We apply a specific form of dataflow modeling referred to as parameterized
synchronous dataflow (PSDF), which offers valuable properties in terms of
modeling systems with dynamic parameters, supporting efficient scheduling techniques,
and natural integration with popular SDF modeling techniques [4]. Compared to
enable-invoke dataflow [43], PSDF has lower expressive power, but is equipped
with streamlined scheduling techniques for the subclass of application models
that are amenable to PSDF semantics. Compared to scenario-aware dataflow [51],
PSDF can be viewed as having a stricter separation between data and
parameters, which facilitates symbolic scheduling techniques based on parameterized
looped schedules.

As described in Section 2.3, PSDF is based on parameterized dataflow, which
is a meta-modeling technique that can significantly improve the expressive power
of an arbitrary dataflow model that possesses a well-defined concept of a graph
iteration [3]. Parameterized dataflow provides a method to systematically integrate
dynamic parameter reconfiguration into such models, while preserving many of
the original properties and intuitive characteristics of the original models. The
integration of the parameterized dataflow meta-model with SDF provides the model
of computation that we refer to as parameterized synchronous dataflow (PSDF).

Efficient  quasi-static  scheduling  techniques  have  been  demonstrated  previ¬ 
ously  for  PSDF  specifications  [4].  Here,  by  quasi-static  scheduling,  we  refer  to  a 
general approach to scheduling in which significant portions of schedule structure
are fixed at compile time, while some amount of run-time schedule adjustment
can  be  made  in  response  to  input  data  or  changes  in  operational  requirements. 

4.1. Execution of PSDF Graphs

The  PSDF  model  allows  the  behavior  of  subsystems  to  be  controlled  by  sets  of 
parameters  that  can  be  configured  dynamically.  Such  parameter  control  is  mod¬ 
eled  by  mapping  selected  dataflow  graph  outputs  of  certain  graphs,  which  are 
dedicated  to  computing  parameter  updates,  to  parameters  of  the  graphs  that  they 
control.  This  coordination  between  parameter  update  computations  and  parame¬ 
ter  reconfiguration  operates  under  a  carefully  structured  framework  that  promotes 
predictability  and  efficiency.  Basic  concepts  associated  with  PSDF  modeling  and 
execution  are  outlined  as  follows. 

• A PSDF subsystem consists of three distinct PSDF graphs, called the init,
subinit, and body graphs.

•  The  interface  dataflow  behavior  of  a  PSDF  subsystem  (i.e.,  the  rates  of  token 
production and consumption at the subsystem inputs and outputs) can be
changed only by the init graph.

•  The  init  graph  can  configure  both  the  subinit  and  body  graphs. 

•  The  subinit  graph  is  allowed  to  configure  the  body  graph  but  not  allowed  to 
change  the  interface  dataflow  behavior  of  its  enclosing  subsystem. 

•  The  body  graph  is  executed  immediately  after  the  execution  of  the  subinit 
graph. 

•  A  hierarchical  PSDF  actor  encapsulates  a  PSDF  subsystem;  such  nesting  in 
terms of PSDF semantic hierarchy can be arbitrarily deep based on how a
design  is  constructed. 
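The ordering rules listed above can be sketched in a few lines of Python. This is only an illustrative model, not PSDFsim code: the function run_subsystem, its arguments, and the parameter names "rate" and "gain" are hypothetical, and the sketch assumes that the init graph runs once and is followed by several subinit/body invocations.

```python
def run_subsystem(init, subinit, body, invocations, interface=("rate",)):
    """Run init once, then (subinit, body) pairs; return the body results.

    init, subinit, and body are plain callables standing in for PSDF graphs.
    `interface` names the parameters that define interface dataflow behavior.
    """
    params = dict(init())                 # init may set any subsystem parameter
    results = []
    for _ in range(invocations):
        updates = subinit(params)
        # subinit may configure the body, but not the subsystem interface
        if any(name in interface for name in updates):
            raise ValueError("subinit must not change interface dataflow behavior")
        params.update(updates)
        results.append(body(params))      # body runs immediately after subinit
    return results


def example_init():
    return {"rate": 3, "gain": 0}

def example_subinit(params):
    return {"gain": params["gain"] + 1}

def example_body(params):
    return (params["rate"], params["gain"])

print(run_subsystem(example_init, example_subinit, example_body, 4))
```

In this sketch the interface parameter stays fixed across all four body invocations, while the parameter controlled by subinit changes on every invocation.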

We  use  the  downsampler  shown  in  Figure  6  to  illustrate  these  concepts.  Here, 
node  H  represents  a  PSDF  downsampler  (i.e.,  a  subsystem),  which  has  two  pa¬ 
rameters,  the  factor  and  phase.  These  parameters  represent  the  consumption  rate 
F  and  the  index  P  of  the  token  that  is  selected  (for  transfer  to  the  output  port) 
among  the  F  tokens  that  are  consumed  in  a  single  execution  of  the  downsampler. 
Thus,  for  example,  if  F  =  5  and  P  =  2,  then  downsampling  by  a  factor  of  5 
is  performed,  and  on  each  execution,  the  downsampler  outputs  the  second  token 
from among the window of 5 tokens that it consumes.

The consumption or production rate associated with a subsystem input or output
port is viewed as interface dataflow behavior, and can only be configured by
the init graph (or kept fixed at a statically configured value), as described above.
Thus, the factor F is configured by the init graph only, whereas the phase P can be
configured either by the init or subinit graph since it does not change the interface
dataflow behavior.
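To make the roles of the factor and phase concrete, the following Python sketch models the downsampler as a small class. The class name Downsampler, the configure method, and the string tags "init" and "subinit" are hypothetical and serve only as illustration; they are not taken from PSDFsim. The phase is treated as a 1-based index, in line with the F = 5, P = 2 example.

```python
class Downsampler:
    """Toy model of the PSDF downsampler (hypothetical names throughout)."""

    def __init__(self, factor=1, phase=1):
        self.factor = factor   # F: tokens consumed per firing (interface behavior)
        self.phase = phase     # P: 1-based index of the token sent to the output

    def configure(self, source, **params):
        """Apply parameter updates coming from the 'init' or 'subinit' graph."""
        for name in params:
            if name not in ("factor", "phase"):
                raise KeyError(name)
        # F changes the consumption rate, so only the init graph may set it.
        if "factor" in params and source != "init":
            raise PermissionError("factor can be configured only by the init graph")
        for name, value in params.items():
            setattr(self, name, value)
        if not 1 <= self.phase <= self.factor:
            raise ValueError("phase must lie in the range 1..factor")

    def fire(self, tokens):
        """Consume exactly F tokens and return the P-th one."""
        if len(tokens) != self.factor:
            raise ValueError("a firing consumes exactly F tokens")
        return tokens[self.phase - 1]


h = Downsampler()
h.configure("init", factor=5)      # interface change: init only
h.configure("subinit", phase=2)    # non-interface change: init or subinit
print(h.fire([10, 20, 30, 40, 50]))
```

With F = 5 and P = 2 the firing returns the second token of the window, which is 20 for the input shown.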

In  this  example,  P  is  configured  by  the  subinit  graph  to  allow  a  finer  gran¬ 
ularity  of  (more  frequent)  control  compared  to  configuration  by  the  init  graph. 
Initially,  actor  E  configures  F  to  the  value  3,  which  yields  an  SDF  graph  that 
maintains  its  given  SDF  properties  while  the  parameter  F  remains  fixed  at  this 
value.  To  execute  the  graph  in  this  SDF  configuration,  we  can  apply  any  valid  SDF 
schedule for the configuration; one such schedule is (3A)BHC. This schedule
is repeated some number of times before the downsampling factor is changed. In
particular,  changes  must  occur  between  iterations  of  this  schedule,  as  governed  by 
PSDF  semantics  so  that  within  any  given  iteration  the  graph  operates  as  an  SDF 
graph,  while  the  SDF  graph  configuration  can  be  changed  between  iterations. 
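The looped schedule notation used for this example can be expanded mechanically into a firing sequence. The sketch below is a minimal illustration under an assumed representation: a schedule is a Python list whose items are either actor names or (count, sub-schedule) pairs. The function name expand is hypothetical.

```python
def expand(schedule):
    """Flatten a looped schedule into the list of actor firings it denotes.

    Each item is either an actor name (a string) or a pair
    (count, sub_schedule) that denotes a loop.
    """
    firings = []
    for item in schedule:
        if isinstance(item, tuple):
            count, sub_schedule = item
            for _ in range(count):
                firings.extend(expand(sub_schedule))
        else:
            firings.append(item)
    return firings


# The schedule (3A)BHC: three firings of A, then B, H, and C once each.
one_iteration = expand([(3, ["A"]), "B", "H", "C"])
print("".join(one_iteration))
```

A quasi-static executor would repeat such an iteration as a unit and apply any pending change of the parameter F only at the boundary between two iterations.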


Figure  6:  A  PSDF-based  downsampler  example. 

PSDF-based  design  and  implementation  is  supported  by  a  Java-based  PSDF 
simulator,  called  PSDFsim  [55],  which  provides  modeling  and  functional  sim¬ 
ulation  capabilities  for  PSDF  specifications  as  part  of  the  dataflow  interchange 
format (DIF) environment [19].

4.2. PSDF Design Methodology

PSDF semantics can be applied for model-based design at the front end of
the  FPGA/ASIC  design  flow  shown  in  Figure  7.  Such  an  approach  provides  a 
structured  framework  for  modeling  adaptive  behaviors  and  dynamic  reconfigura¬ 
tion,  and  deriving  corresponding  adaptations  to  scheduling  strategies  and  resource 
allocations  (e.g.,  see  [4,  55]). 

Our  PSDF-based  approach  for  FPGA  system  design  involves  two  key  phases 
—  high  level  modeling  and  validation  (modeling)  and  hardware  architecture  map¬ 
ping  (mapping).  These  two  phases  can  in  general  be  applied  iteratively  to  imple¬ 
ment  and  experimentally  refine  dataflow  based  parallel  processing  structures  for 
FPGA- or ASIC-based signal processing systems.

The  modeling  phase  ensures  correct  application  functionality  as  well  as  the 
correct  formulation  of  the  functionality  in  terms  of  dataflow  and  PSDF  princi¬ 
ples.  Through  its  direct  connection  to  the  concurrency  modeling  capabilities  of 
dataflow,  this  phase  helps  provide  a  framework  for  efficient  implementation  even 
though the focus of this phase is on functional validation rather than detailed
hardware  mapping.  In  this  phase,  procedural  software  code  is  used  to  specify  the 
internal  functionality  of  the  actors,  while  a  dataflow  language  is  used  to  specify 
the  high-level  (inter-actor)  application  model.  In  PSDFsim,  the  Java  and  DIF  lan¬ 
guages  are  used  for  these  purposes  of  intra-actor  and  inter-actor,  modeling-phase 
specification,  respectively. 

In  the  mapping  phase,  the  designer  applies  the  individual  actor  models  as  func¬ 
tional  references  to  derive  corresponding  hardware  implementations  using  a  hard¬ 
ware  description  language  (HDL).  The  functionality  of  these  “hardware  actors” 
can  be  validated  using  the  same  testbenches  as  those  used  in  the  modeling  phase. 
Use  of  the  formal  dataflow  methodology  to  encapsulate  design  components  (ac¬ 
tors)  facilitates  this  reuse  of  testbenches.  Similarly,  edges  in  the  DIF-based  ap¬ 
plication  model  are  mapped  into  corresponding  FIFO  implementations  using  the 
targeted  HDL  and  associated  design  library. 

By  developing  the  actors  based  on  PSDF  principles,  and  connecting  them 
through  standard  FIFO  semantics,  functional  correctness  of  the  overall,  application- 
level  hardware  implementation  follows  directly  from  correctness  of  the  original 
PSDF  application  model,  and  correct  mappings  of  the  individual  actor  models  into 
hardware.  Additionally,  the  application  level  model  from  the  modeling  phase  can 
be  used  as  a  testbench  to  begin  application-level  testing  of  the  hardware,  where 
both  functional  and  timing  constraints  must  be  taken  into  account.  Insight  from 
timing analysis of the hardware implementation can then be used to optimize the
hardware actors and possibly to iterate back to the modeling phase to explore
refinements  or  alternatives  to  the  high  level  dataflow  architecture. 


Figure  7:  FPGA/ASIC  design  flow  overview. 

The  simulation  and  implementation  tools  discussed  in  Section  3  focus  on  map¬ 
ping  hardware  description  language  (HDL)  programs  into  FPGA  implementa¬ 
tions.  In  contrast,  our  proposed  methods  show  how  to  map  higher  level,  model 
based  specifications  into  monolithic,  FPGA-targeted,  HDL  programs,  which  can 
then  be  further  processed  by  tools  such  as  those  discussed  in  Section  3.  In  our 
experiments,  we  have  not  integrated  the  tools  discussed  in  Section  3  in  this  way 
(i.e.,  as  a  back  end  to  our  proposed  methods);  this  is  a  useful  direction  for  further 
study. 

5.  Hardware  Mapping 

In  this  section,  we  present  two  hardware  architecture  mapping  methods  that 
apply  PSDF  modeling,  and  support  dynamic  reconfiguration  with  two  useful  ob¬ 
jectives  —  modularity  and  performance  optimization.  Both  of  these  methods  ex¬ 
ploit  the  modular  design  representation  format  facilitated  by  PSDF,  which  is  dis¬ 
cussed  in  Section  5.1.  We  present  a  novel  form  of  co-design  between  PSDF  appli¬ 
cation  modeling  and  scheduling  in  Section  5.2,  and  we  demonstrate  the  utility  of 
this  approach  in  deriving  efficient,  model-based  implementations  of  dynamically 
reconfigurable signal processing systems.

5.1. Modular Mapping

We develop a systematic approach for mapping PSDF specifications into hardware
implementations. Because of natural correspondences that are used between
dataflow design objects (actors and edges) and corresponding hardware structures,
as well as between specialized PSDF modeling features and their implementations,
the approach provides a high degree of modularity. In this modular approach,
implementations are composed in terms of smaller building blocks that can be
tested independently and integrated precisely through our mapping of PSDF
semantics into hardware control.

Previous work on mapping dataflow structures into hardware includes the work
on VLSI dataflow arrays [23], multidimensional arrayed dataflow [34], and
SystemC [17]. The methods developed in this paper are different from these
approaches in their support for parameterized dataflow modeling, and the novel
features of dynamic parameter reconfiguration and reconfigurable dataflow modeling
that are provided by PSDF semantics [4, 3]. Due to the potential for applying
parameterized dataflow semantics with arbitrary dataflow models of computation
(subject to suitable definitions of graph iterations), the integration of the
techniques presented in this paper with the models used in the aforementioned works
is an interesting direction for further study.

PSDF and PSDFsim modeling constructs (in particular, PSDF actors, edges,
schedules, parameter propagation paths, and operational semantics) map naturally
into corresponding hardware structures. Table 1 summarizes our methodology
for deriving such mappings.

Table 1: Mapping PSDF constructs to hardware.

PSDF Modeling Components      Hardware Components
actor                         circuit block
edge                          buffer (e.g., FIFO)
schedule                      graph controller
parameter propagation path    wire
operational semantics         subsystem controller

Although the complexity of circuit blocks can vary widely, the top-down
application of PSDF principles provides a standardized design style for the interaction
between different circuit blocks and for the interaction between circuit blocks and
the associated control for scheduling and parameter management for the blocks.
This allows for significant reuse of parameterized HDL “glue code”, as well as
corresponding streamlining of verification effort.

We  employ  self-timed  scheduling  and  control  of  dataflow  actors  within  a  PSDF 
context. In such a self-timed approach, actors can fire as soon as they have
sufficient data on their input ports, have access to sufficient empty buffer slots on their
output  ports,  and  have  their  current  parameter  values  available,  as  determined  by 
the  associated  subinit  and  init  graphs.  Such  self-timed  hardware  mapping  is  natu¬ 
ral  for  signal  processing  oriented  dataflow  models  of  computation  (e.g.,  see  [7]). 
The  modularity  of  the  approach  is  enhanced  because  the  controllers  for  individ¬ 
ual  actors  are  structured  independently  of  any  global  scheduling  control,  which 
allows  scheduling  strategies  to  be  changed  efficiently,  conveniently,  and  reliably. 
In  this  approach,  only  loop  counts  associated  with  actor  control  vary  with  changes 
in  the  schedule  control,  and  such  adaptation  of  loop  counts  can  be  carried  out  nat¬ 
urally  as  actor  parameter  updates  through  the  overall  framework  of  parameterized 
dataflow. 
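The self-timed firing condition described above can be written as a simple predicate. The following Python sketch is a software illustration only; the Fifo class, its method names, and the can_fire function are hypothetical and do not correspond to any HDL module from our implementation.

```python
from collections import deque


class Fifo:
    """Bounded buffer standing in for a hardware FIFO on a dataflow edge."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = deque()

    def tokens(self):
        return len(self.items)

    def free_slots(self):
        return self.capacity - len(self.items)


def can_fire(inputs, outputs, params_ready):
    """Self-timed firing rule.

    inputs  : list of (fifo, consumption_rate) pairs
    outputs : list of (fifo, production_rate) pairs
    params_ready : True once current parameter values are available
    """
    enough_data = all(fifo.tokens() >= rate for fifo, rate in inputs)
    enough_space = all(fifo.free_slots() >= rate for fifo, rate in outputs)
    return params_ready and enough_data and enough_space


src, dst = Fifo(capacity=4), Fifo(capacity=1)
src.items.extend([1, 2, 3])
print(can_fire([(src, 3)], [(dst, 1)], params_ready=True))
```

Because the predicate depends only on the local buffers and parameters of one actor, it can be evaluated independently for every actor, which mirrors the independence from global scheduling control noted above.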

Figure  8  illustrates  the  architecture  of  a  standard  wrapper  for  PSDF-based 
interfacing  of  actor  circuit  blocks.  Here,  the  blocks  labeled  counter,  controller, 
and  loop  count  handle  control  and  iteration  management  within  the  functional 
unit  of  the  actor,  which  can  be  of  arbitrary  complexity.  The  blocks  labeled  cons 
circuit  and  prod  circuit  handle  input  and  output  interfacing  of  the  actor  based  on 
dataflow  rates  that  may  be  parameterized  and  dynamically  configured. 

The  structure  of  hardware  mapping  at  the  PSDF  subsystem  level  is  illustrated 
in  Figure  9.  The  controllers  associated  with  the  structures  of  Figure  8  and  Figure  9 
are  illustrated  in  Figure  10.  In  comparison  with  the  circuit  block,  the  other  hard¬ 
ware  components  are  relatively  less  complicated,  and  to  provide  flexibility,  we  do 
not  constrain  the  implementations  of  these  components  to  any  particular  styles. 
For  example,  FIFOs  can  be  implemented  using  D  Flip-Flops  or  SRAMs.  The  sub¬ 
system  controllers  and  graph  controllers  (i.e.,  the  controllers  for  the  init,  subinit 
and  body  graphs)  follow  PSDF  operational  semantics  and  the  generated  schedule 
to  guide  execution  of  the  graph  controllers  and  the  circuit  blocks,  respectively. 
These  two  types  of  controllers  are  finite  state  machines  in  which  control  remains 
in  a  given  state  while  there  is  no  triggering  input.  For  instance,  a  graph  controller 
remains  in  the  EXE  state  until  the  corresponding  circuit  block  completes  execu¬ 
tion. 

The  circuit  block  control,  illustrated  in  Figure  10(a),  is  a  key  part  of  our  pro¬ 
posed  method  for  self-timed,  PSDF  hardware  implementation.  Such  a  circuit 
block  control  structure  provides  control  for  an  individual  PSDF  actor.  At  the 
beginning of a control iteration (the state labeled PARAM), the circuit block
configures any dynamically managed parameters based on the current settings and
attempts to consume data from the actor input port. The controller will block in
attempts  to  consume  data  from  the  actor  input  port.  The  controller  will  block  in 
the  CONS  state  until  all  data  has  arrived  from  the  corresponding  producer  actor, 
and  has  been  consumed  for  processing  by  the  circuit  block.  Then  the  controller 
enters  the  EXE  state  and  activates  the  encapsulated  functional  unit  to  process  the 
input  data  and  generate  any  output  values.  When  the  output  data  is  ready,  the 
prod  circuit  pushes  the  output  data  onto  the  corresponding  output  edges  in  the 
PROD  state.  Finally,  after  all  output  data  has  been  written,  the  controller  enters 
the  DONE  state.  In  the  DONE  state,  if  the  firing  count  within  the  current  loop  exe¬ 
cution  matches  the  loop  count,  then  the  controller  transitions  back  to  the  PARAM 
state  and  waits  for  another  circuit  block  iteration  before  proceeding;  otherwise, 
the  controller  transitions  to  the  CONS  state  to  consume  tokens  for  the  next  firing. 
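The state sequence of the circuit block controller can be summarized as a transition function. The Python sketch below is a behavioral illustration, not the HDL controller itself. The signal names (start, inputs_consumed, exec_done, outputs_written) are hypothetical, and the sketch assumes that the PARAM state waits for a start trigger before moving to CONS.

```python
def next_state(state, *, start=False, inputs_consumed=False, exec_done=False,
               outputs_written=False, firing_count=0, loop_count=1):
    """Return the successor of `state`; stay put while no trigger is asserted."""
    if state == "PARAM":
        return "CONS" if start else "PARAM"
    if state == "CONS":
        return "EXE" if inputs_consumed else "CONS"
    if state == "EXE":
        return "PROD" if exec_done else "EXE"
    if state == "PROD":
        return "DONE" if outputs_written else "PROD"
    if state == "DONE":
        # Loop finished: go back to PARAM; otherwise start the next firing.
        return "PARAM" if firing_count == loop_count else "CONS"
    raise ValueError(f"unknown state: {state}")


def trace_iteration(loop_count):
    """States visited in one control iteration when every trigger is asserted."""
    state, fired, visited = "PARAM", 0, ["PARAM"]
    while True:
        if state == "PROD":
            fired += 1          # a firing completes once its outputs are written
        state = next_state(state, start=True, inputs_consumed=True,
                           exec_done=True, outputs_written=True,
                           firing_count=fired, loop_count=loop_count)
        visited.append(state)
        if state == "PARAM":
            return visited


print(trace_iteration(2))
```

For a loop count of 2, the trace passes through CONS, EXE, PROD, and DONE twice before it returns to PARAM.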


Figure 8: Interface and control architecture for a circuit block.

Based  on  the  modularity  of  our  hardware  mapping  approach,  described  earlier, 
the  control  structures  for  actors  need  to  be  configured  only  by  setting  their  respec¬ 
tive  loop  counts.  Such  loop  counts  can  be  derived  and  validated  efficiently  at  the 
functional  prototyping  stage,  using  the  PSDFsim  tool  in  conjunction  with  tech¬ 
niques  to  determine  repetitions  vectors  of  specific  SDF  configurations  [55].  The 
repetitions  vector  of  an  SDF  graph  gives  the  number  of  times  each  actor  needs  to 
be  fired  in  a  periodic  schedule  for  the  graph  [26]. 
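As an illustration of how loop counts can be obtained for one SDF configuration, the following Python sketch solves the SDF balance equations q[src] * prod = q[dst] * cons for a connected graph and returns the smallest positive integer solution. It is a generic textbook-style computation, not the routine used in PSDFsim. The function name and the example rates are hypothetical; the rates are chosen only so that the result matches the schedule (3A)BHC, and they are not taken from Figure 6.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetitions_vector(actors, edges):
    """Minimal repetitions vector of a connected, consistent SDF graph.

    edges: list of (src, dst, production_rate, consumption_rate)
    """
    q = {actors[0]: Fraction(1)}
    pending = list(edges)
    while pending:
        progressed = False
        for edge in list(pending):
            src, dst, prod, cons = edge
            if src in q and dst in q:
                if q[src] * prod != q[dst] * cons:
                    raise ValueError("inconsistent sample rates")
            elif src in q:
                q[dst] = q[src] * prod / cons
            elif dst in q:
                q[src] = q[dst] * cons / prod
            else:
                continue              # neither endpoint reached yet
            pending.remove(edge)
            progressed = True
        if not progressed:
            raise ValueError("graph is not connected")
    if len(q) != len(actors):
        raise ValueError("graph is not connected")
    # Scale the rational solution to the smallest integer solution.
    scale = reduce(lambda a, b: a * b // gcd(a, b),
                   (value.denominator for value in q.values()), 1)
    return {actor: int(q[actor] * scale) for actor in actors}


edges = [("A", "B", 1, 3), ("B", "H", 3, 3), ("H", "C", 1, 1)]
print(repetitions_vector(["A", "B", "H", "C"], edges))
```

Each entry of the result can then be used directly as the loop count of the corresponding actor controller.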

Thus,  the  overall  design  flow  involves  applying  PSDFsim  for  functional  pro¬ 
totyping,  performing  systematic  hardware  mapping  using  the  approach  described 
in  this  section,  and  then  synthesizing  and  deploying  the  resulting  self-timed  im¬ 
plementation  on  the  target  FPGA. 

Figure 9: An illustration of subsystem-level hardware mapping.

Figure 10: Finite state machines for (a) a circuit block, (b) a graph controller, (c) consumption and
production circuits, and (d) a subsystem controller.

5.2. Schedule-Based Mapping

Effective scheduling is important in deriving efficient implementations of
dynamically reconfigurable signal processing systems. However, scheduling in the
presence of dynamic reconfiguration is challenging because of the increased
dynamics in the scheduling process, as well as the increased difficulty in modeling
and manipulating schedules whose structures change during execution.

The  dataflow  schedule  graph  (DSG),  proposed  in  [57],  is  a  dataflow  based 
schedule  representation  that  helps  to  address  these  challenges.  The  DSG  can  be 
viewed  as  a  model  for  representing  schedules  of  dataflow  graphs  that  is  itself 
rooted  in  dataflow  semantics.  This  allows  an  integrated  modeling  approach  among 
applications,  schedules,  and  their  interactions. 

DSG-based  modeling  of  actors  centers  around  two  special  kinds  of  actors, 
called  schedule  control  actors  (SCAs)  and  reference  actors  (RAs).  A  DSG  for  a 
given  processor  (i.e.,  the  DSG-based  schedule  model  for  a  given  processor)  can 
have  at  most  one  token  at  any  given  time  within  the  graph.  This  token  serves 
to  enable  the  next  actor  to  be  executed  on  the  processor.  The  restriction  that  the 
total  token  count  be  bounded  by  1  enforces  the  constraint  that  a  processor  can 
execute  at  most  one  task  at  any  given  time.  Here,  a  “processor”  can  represent 
any  computational  resource  that  executes  actors  in  a  dedicated  (exactly  one  actor 
assigned  to  the  resource)  or  time-multiplexed  manner. 

In  contrast  to  conventional  dataflow  actors,  which  represent  functional  com¬ 
ponents  from  the  original  application  specification  (application  actors),  SCAs  are 
dataflow  actors  that  are  dedicated  to  coordinating  control  flow  in  derived  sched¬ 
ules.  On  the  other  hand,  RAs  can  be  viewed  as  “pointers”  to  application  actors. 
These  pointers  are  equipped  with  optional  auxiliary  computations.  Intuitively,  an 
RA  represents  a  scheduling  “wrapper”  that  specifies  the  computation  that  is  ex¬ 
ecuted  when  the  corresponding  actor  is  “visited”  during  schedule  execution.  A 
basic  form  of  RA  is  one  that  simply  performs  a  guarded  execution  of  the  actor 
that  it  points  to.  A  guarded  execution  of  an  actor  does  nothing  if  the  actor  does 
not  have  sufficient  data  on  its  inputs  to  complete  its  next  firing;  if  sufficient  in¬ 
put  data  is  available,  a  guarded  execution  executes  a  single  firing  of  the  actor. 
However,  more  capabilities  —  beyond  just  performing  guarded  executions  —  can 
be  incorporated  into  RAs  using  the  optional  auxiliary  computations  mentioned 
above. 
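A guarded execution can be illustrated with a short Python sketch. The classes CountingActor and ReferenceActor and their methods are hypothetical stand-ins for an application actor and its RA; the optional callable named before plays the role of an auxiliary computation.

```python
class CountingActor:
    """Stand-in application actor that consumes `rate` tokens per firing."""

    def __init__(self, rate):
        self.rate = rate
        self.queue = []       # input buffer
        self.firings = 0

    def can_fire(self):
        return len(self.queue) >= self.rate

    def fire(self):
        del self.queue[:self.rate]
        self.firings += 1


class ReferenceActor:
    """Scheduling wrapper that points to one application actor."""

    def __init__(self, target, before=None):
        self.target = target
        self.before = before  # optional auxiliary computation

    def visit(self):
        """Guarded execution: fire once if enough input data is available."""
        if self.before is not None:
            self.before(self.target)
        if self.target.can_fire():
            self.target.fire()
            return True
        return False          # insufficient data: do nothing


actor = CountingActor(rate=2)
ra = ReferenceActor(actor)
actor.queue.append("t0")
print(ra.visit())             # only one of the two required tokens is present
actor.queue.append("t1")
print(ra.visit())             # both tokens present, so the actor fires once
```

The first visit leaves the actor untouched and the second visit performs exactly one firing.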

Table  2  gives  examples  of  several  types  of  SCAs  and  summarizes  properties 
of  these  actors.  The  loop  actor  has  two  pairs  of  inputs  and  outputs.  One  pair  is 
used  to  perform  computations  within  the  loop  repeatedly,  while  the  other  pair  is 
used  for  conditionally  branching  into  and  exiting  the  loop  based  on  certain  control 
conditions.  Since  there  is  only  one  token  in  the  enclosing  DSG,  execution  always 
proceeds  unambiguously  either  inside  or  outside  the  loop. 

SCAs can be paired with other SCAs to provide special control
functions that involve their coordination. For example, case and endcase provide
DSGs  with  the  capability  of  selecting  computations  conditionally.  The  number 
of  outputs  for  a  given  case  actor  must  match  the  number  of  inputs  to  the  corre¬ 
sponding  endcase  actor  to  provide  conditional  selection  of  the  computations  that 
are  enclosed  by  the  matching  case  and  endcase  pair. 


Table 2: Examples of SCAs.

SCA        # of inputs    # of outputs
loop       2              2
case       1              >2
endcase    >2             1

As  described  earlier,  tokens  that  flow  along  edges  of  the  DSG  serve  to  enable 
actors  for  execution  (as  it  becomes  their  turn  to  execute).  DSG  tokens  can  also 
contain  values  that  are  manipulated  and  queried  during  execution  of  the  DSG  to 
achieve  various  forms  of  data-  or  parameter-dependent  schedule  control. 

The  execution  of  a  PSDF  specification  involves  careful  coordination,  based  on 
details  of  PSDF  semantics,  among  the  init,  subinit,  and  body  graphs  within  the  re- 
configurable  subsystems  that  are  enclosed  by  the  specification.  By  modeling  this 
PSDF-driven  coordination  in  terms  of  DSGs,  we  can  precisely  represent  PSDF 
execution  (i.e.,  the  operational  semantics  of  PSDF)  in  terms  of  pure  dataflow  con¬ 
cepts,  thereby  enabling  analysis  and  manipulation  of  schedules  based  on  dataflow 
techniques  rather  than  having  to  rely  on  specialized  PSDF-based  methods.  More¬ 
over,  such  a  PSDF  to  DSG  transformation  allows  PSDF  graphs  to  be  implemented 
through  reuse  of  dataflow  techniques  rather  than  through  specialized  implemen¬ 
tation  structures  that  are  derived  for  PSDF.  Such  a  transformation  thus  combines 
the  high  level  modeling  flexibility  and  analysis  potential  offered  by  PSDF  with 
streamlined  paths  to  implementation  offered  through  the  use  of  the  DSG  as  an 
intermediate  representation. 

A  general  DSG  model  for  PSDF  execution  is  illustrated  in  Figure  11.  Param¬ 
eter  reconfiguration  is  achieved  through  communication  between  RAs  and  appli¬ 
cation  actors  through  DSG  tokens  that  encapsulate  updated  parameter  values.  For 
example,  if  the  init  graph  changes  the  value  of  a  parameter  associated  with  a  body 
graph  actor  A,  the  DSG  token  can  “carry”  this  value  (e.g.,  within  a  list  of  pending 
parameter  updates)  to  A  for  reconfiguration.  Once  the  body  graph  is  executed, 
and the DSG token “reaches” the RA that encapsulates A, the parameter update
can be “unpacked” from the DSG token and applied to A before A executes. A
similar  approach  can  be  used  to  achieve  parameter  control  of  the  subinit  graph  by 
the  init  graph,  and  of  the  body  graph  by  the  subinit  graph. 
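The idea of a DSG token that carries pending parameter updates can be sketched as follows. This is one possible software realization under stated assumptions, not the representation used in our hardware mapping: the class DsgToken, its post and unpack methods, and the visit function are hypothetical names.

```python
class DsgToken:
    """The single token of a DSG, carrying a list of pending parameter updates."""

    def __init__(self):
        self.pending = []                 # entries: (actor, parameter, value)

    def post(self, actor, parameter, value):
        """Record an update, e.g. while the init or subinit graph executes."""
        self.pending.append((actor, parameter, value))

    def unpack(self, actor):
        """Remove and return the updates addressed to one actor."""
        mine = {p: v for a, p, v in self.pending if a == actor}
        self.pending = [u for u in self.pending if u[0] != actor]
        return mine


def visit(token, actor, params, fire):
    """RA behavior: apply pending updates to the actor, then execute it."""
    params.update(token.unpack(actor))
    return fire(params)


token = DsgToken()
token.post("A", "gain", 4)                # posted by the init graph
a_params = {"gain": 1}
print(visit(token, "A", a_params, lambda p: p["gain"] * 10))
```

Here the update reaches actor A only when the token arrives at the RA of A, so A executes with the new value and the token no longer holds the update afterwards.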

Thus,  using  the  DSG  model,  processes  associated  with  dynamic  reconfigura¬ 
tion  can  be  abstracted  in  a  way  that  precisely  and  flexibly  represents  the  relevant 
functionality  while  hiding  platform-specific  implementation  details  about  how  the 
reconfiguration  is  achieved.  For  example,  whether  the  init  graph  stores  parameter 
values  in  registers  or  memory,  and  how  the  DSG  token  “points  to”  such  storage 
locations to reference the associated configuration settings, are left as
implementation details that can be refined from the DSG-based model.

Figure 11: Modeling PSDF execution using DSGs.

DSGs  can  be  applied  not  only  as  a  target  for  automated  schedule  generation 
techniques,  but  also  as  a  model  in  which  designers  specify,  experiment  with,  and 
iteratively  refine  schedules.  Because  they  are  rooted  in  familiar  dataflow  princi¬ 
ples,  rather  than  specialized  or  esoteric  scheduling  notations,  designers  can  work 
with  DSG  representations  using  well  understood  modeling  concepts.  In  such  a 
way, designers can experiment flexibly with schedules that interact with
application actors, and apply platform-based features for dynamic reconfiguration. By
providing such flexibility and facilitating such experimentation, DSG-based
hardware mapping of PSDF graphs can help designers explore complex design spaces
associated  with  dynamically  reconfigurable  signal  processing  systems,  and  tai¬ 
lor  implementations  based  on  the  specific  implementation  constraints  for  a  given 
application. 

In  the  following  section,  we  explore  this  integrated  PSDF-  and  DSG-driven 
design  methodology  using  two  case  studies  that  involve  relevant  signal  processing 
applications. 

6.  Case  Studies 

In this section, we present two case studies with which we concretely
demonstrate our proposed methods for model-based implementation of dynamically
reconfigurable signal processing systems.

6.1. Reconfigurable Phase-shift Keying

First,  we  demonstrate  our  PSDF-based  design  methodology  and  modular  hard¬ 
ware  mapping  techniques  using  a  reconfigurable  phase-shift  keying  (PSK)  appli¬ 
cation  that  can  be  configured  as  binary  PSK  (BPSK),  quadrature  PSK  (QPSK) 
or  8PSK.  We  construct  PSDF  models  of  the  modulator  and  demodulator  for  this 
system,  and  develop  Java-based  functional  DIF  code  to  specify  the  internal  func¬ 
tionality  of  each  actor.  The  resulting  PSDF  program  is  simulated  and  tested  using 
PSDFsim, and then hardware mapping is applied to the modulator to derive a Verilog
implementation. HDL simulation and synthesis are then applied to validate the
derived  hardware. 

Figure  12  illustrates  our  PSDF  model  of  the  targeted  system  for  reconfigurable 
PSK.  Here,  D  represents  an  input  interface  that  injects  samples  from  the  incoming 
data  stream  into  the  dataflow  graph;  T  and  P  are  parameterized  lookup  tables;  II 
is  an  actor  that  configures  the  consumption  rate  (based  on  M)  of  T;  S2  and  S4 
provide  trigonometric  functions  that  are  selected  based  on  a  dynamic  parameter 
setting; IS configures the production rate of P; A is an adder; X12 and X34
are  constant  multipliers  whose  associated  constants  (scaling  factors)  are  managed 
as  dynamic  parameters;  and  B  is  an  output  interface  for  the  storing  or  further 
processing  of  the  resulting  binary  sequence.  The  input  interface  D  makes  two 
copies  of  each  input  token  on  its  output  since  two  separate  multiplications  are 
required  for  each  input  sample. 

Figure 12: PSDF-based models of (a) the PSK modulator and (b) the PSK demodulator.

Our PSDF model involves a parameter M, which determines which form of
PSK to employ. For M = 1, 2, 3, an SDF graph associated with BPSK, QPSK, and
8PSK, respectively, is effectively activated. After the system model is constructed,
we use PSDFsim to simulate the system and validate the functionality for the
different values of M. This initial simulation is performed assuming no distortion
of data in the channel.
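The role of the parameter M can be illustrated with a small, self-contained Python sketch of PSK symbol mapping, in which M bits are consumed per symbol and the constellation has 2^M points. This is a generic illustration rather than a description of the actors in Figure 12: the function names are hypothetical, and the natural binary mapping from bit groups to phases is an assumption (practical systems often use Gray coding).

```python
import cmath
import math


def psk_table(m):
    """Lookup table of 2**m unit-magnitude constellation points."""
    points = 2 ** m
    return [cmath.exp(2j * math.pi * k / points) for k in range(points)]


def modulate(bits, m):
    """Map groups of m bits to PSK symbols (natural binary mapping, assumed)."""
    if len(bits) % m != 0:
        raise ValueError("number of bits must be a multiple of m")
    table = psk_table(m)
    symbols = []
    for i in range(0, len(bits), m):
        index = int("".join(str(b) for b in bits[i:i + m]), 2)
        symbols.append(table[index])
    return symbols


bits = [0, 1, 1, 0, 1, 1]
for m, name in ((1, "BPSK"), (2, "QPSK"), (3, "8PSK")):
    print(name, len(modulate(bits, m)), "symbols")
```

Changing M changes the number of bits consumed per symbol, which corresponds to a change in the consumption rate of the lookup table stage in the dataflow model.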

Since  channel  quality  is  critical  to  the  choice  of  PSK,  we  can  modify  actor  C  to 
model  the  noise  in  the  channel,  and  analyze  the  simulation  results  under  different 
PSK  configurations.  PSDFsim  enables  such  multi-mode  application  simulation  to 
be  executed  in  an  integrated  manner  —  i.e.,  as  a  single  simulation  that  includes  all 
PSK  configurations  along  with  simulation  control  functionality  that  dynamically 
changes  the  configuration. 

Our  hardware  mapping  of  the  modulator  is  illustrated  in  Figure  13.  Here,  the 
filler  block  represents  an  actor  that  is  inserted  to  help  maintain  PSDF  operational 
semantics.  Since  the  init  and  subinit  graphs  here  both  contain  one  node  each,  their 
associated  graph  controllers  can  be  removed.  Note  also  that  the  circuit  blocks 
associated with blocks T and X12 are parameterized and receive parameter value
updates  from  circuit  blocks  II  and  S2. 


Figure  13:  Hardware  mapping  for  modulator. 

To  provide  an  area  comparison,  we  instantiate  three  separate  PSK  circuits  that 
support  BPSK,  QPSK  and  8PSK  individually  using  SDF-based  models.  We  com¬ 
pare  this  pure-SDF-based  implementation  with  our  PSDF  implementation,  which 
is derived using PSDFsim and our proposed design methodology. Synthesis results
generated by the Cadence Encounter RTL Compiler are shown in Table 3.
Although  there  is  some  area  overhead  in  the  PSDF  implementation  due  to  the  con¬ 
trollers  and  auxiliary  circuits  used  for  the  init  and  subinit  graphs,  this  overhead  is 
more  than  compensated  for  by  the  hardware  reuse  that  is  facilitated  by  the  flexible, 
dynamic  parameterization  capabilities  of  PSDF. 

Table 3: Comparisons for PSK modulator system.

Area of PSDF design and SDF design (modular hardware mapping)

Design       Area (cells)
PSDF         20004
SDF          33602
Reduction    40.47% (1.68X)
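The reduction figures in Table 3 follow directly from the two cell counts, as the short Python check below shows. The variable names are illustrative only.

```python
psdf_cells = 20004
sdf_cells = 33602

reduction = 1 - psdf_cells / sdf_cells   # fraction of area saved by PSDF
ratio = sdf_cells / psdf_cells           # SDF area relative to PSDF area

print(f"{reduction:.2%} reduction, {ratio:.2f}X")
```

The check reproduces the 40.47% reduction and the 1.68X ratio reported in the table.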

This modular hardware mapping approach is readily applied due to its
generality, and is also useful because it provides a standard method to realize
hardware implementations of PSDF graphs. Our schedule-based hardware
architecture mapping approach using DSGs provides a complementary method, which
can be used (e.g., in later stages of the design process) to specialize the
hardware mapping for a specific application, and to capture the structure of
such specialized mappings in an abstract form that can subsequently be targeted
to platform-specific hardware control structures.

Because of its formal, dataflow-based structure, the DSG is well-suited for
transformation into optimized finite state machine (FSM) structures that
provide control logic for hardware implementation of the associated schedules.
Figure 15 illustrates a DSG representation for the reconfigurable PSK
application, along with an FSM that is derived from the DSG. Most of the states
map to distinct RAs, and execute the functionality of their associated RAs.
Since the loop iteration count of loop2 is fixed, the state Rs2 is designed to
implement loop control as well as firing the actor S2.

In our experiments with schedule-based hardware mapping, we targeted ASIC
implementation using the Cadence Encounter RTL Compiler for back-end synthesis.
The results reported here are synthesis results only (the design was tested
thoroughly but not actually fabricated). Table 4 shows the improvement in area
that is achieved by the streamlined DSG representation compared to the modular
PSDF-to-hardware mapping approach of Section 5.1. This improvement is
accompanied by a formal, dataflow-based representation of schedule logic, which
can be retargeted systematically to other types of platforms for rapid
prototyping and experimentation with platform-specific implementation
trade-offs.

Table 4: Area comparison for reconfigurable PSK modulator under constant speed (100 MHz).

  Hardware mapping    Area (cells)
  schedule-based      18949
  modular             20004
  Reduction           5.27%
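The reduction figures in Tables 3 and 4 follow directly from the reported cell
counts. The short Python check below is only an illustrative sketch: the
function name area_reduction is hypothetical, and the only inputs are the cell
counts quoted in the two tables.

```python
def area_reduction(baseline_cells, improved_cells):
    """Return (percent reduction, baseline/improved ratio) for two cell counts."""
    percent = 100.0 * (baseline_cells - improved_cells) / baseline_cells
    ratio = baseline_cells / improved_cells
    return percent, ratio


# Table 3: pure-SDF design (baseline) versus PSDF design, modular mapping.
t3_pct, t3_ratio = area_reduction(33602, 20004)

# Table 4: modular mapping (baseline) versus schedule-based mapping.
t4_pct, _ = area_reduction(20004, 18949)

print(f"Table 3: {t3_pct:.2f}% ({t3_ratio:.2f}X)")  # Table 3: 40.47% (1.68X)
print(f"Table 4: {t4_pct:.2f}%")                    # Table 4: 5.27%
```

Both percentages are expressed relative to the larger (baseline) design, which
is why the 1.68X ratio in Table 3 corresponds to a reduction of about 40%.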

6.2.  Foreground/Background  Extraction 

Video  surveillance  is  widely  used  for  security  enhancement  and  environmental 
monitoring.  As  the  demand  for  video  surveillance  increases,  the  volume  of  data 
that  must  be  analyzed  for  surveillance  applications  increases  dramatically  as  well. 
Pattern  recognition  helps  to  incorporate  automation  in  this  analysis  process,  and 
make  it  practical  with  limited  human  resources  for  monitoring  surveillance  data. 

To be effective, pattern recognition techniques are often task specific, with
significant fine tuning of system configurations and algorithm parameters in
terms of the kinds of data being analyzed and the objectives of the analysis.
At the same time, the vast amount of data that needs to be processed in typical
applications can make software-based implementation (e.g., using MATLAB and C)
impractical.

Considering these two issues (the need for task-specific tuning and the need
for high performance), PSDF mapping to FPGAs provides a potential solution, in
which both parameter adaptation and high-performance hardware mapping can be
supported and optimized through an integrated design process.

(a) A DSG for the reconfigurable PSK modulator of Figure 14.
(b) An FSM for the DSG in Figure 15(a).

Figure 15: Hardware architecture mapping for a DSG.

In a workload analysis study of video surveillance systems, it has been shown
that the most expensive computation is foreground/background (FG/BG)
extraction [10]. We demonstrate a general PSDF model of FG/BG extraction that
can accommodate a variety of FG/BG extraction algorithms. This model is shown
in Figure 16.

The FG/BG extraction algorithms represented by this model generally involve
two phases: training and differentiating. In the training phase, the
construction of the BG model is based on features extracted from a set of
training frames. This model construction process involves determining
appropriate threshold values for pixels. Then, in the differentiating phase,
the BG model is applied to recognize the foreground: if a given pixel of the
current frame exceeds the associated threshold value, it is recognized as a
foreground pixel; otherwise, it is recognized as a background pixel. The
training methods and threshold values vary with different algorithms and
applications, and careful tuning of these key aspects is typically important
to achieve high accuracy [42].

Our PSDF model shown in Figure 16 contains two subsystems, which are used to
specify algorithms for video preprocessing and FG/BG extraction. The video
preprocessing subsystem here can be viewed as an auxiliary subsystem, which
processes raw data and transforms it into a form that is appropriate for the
extraction algorithms and the underlying processing platforms. In subsystem 2,
actor switch passes video frames to actors BG_model and current_frame in the
training phase. At that point, actor FG_extractor produces no foreground. In
the differentiating phase, actor switch stops sending frames to actor
BG_model, and continues to send frames to actor current_frame. If a pixel of
the current frame exceeds the corresponding threshold value, actor
FG_extractor indicates that the pixel is part of the foreground.


Figure 16: A general PSDF model for FG/BG extraction.

We apply a specific FG/BG extraction algorithm, the running average algorithm,
using the general PSDF model of Figure 16. The running average algorithm
averages the values of the pixels in the training frames to create the BG
model. The target implementation platform for our experiments with the running
average algorithm is the Xilinx Spartan 3E Starter (XC3S500E). RS232 and VGA
ports are selected as the input and output interfaces, respectively.
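The training and differentiating phases of the running average algorithm can be
sketched in a few lines of Python. This is an illustrative software sketch
under stated assumptions, not the Verilog implementation used in our
experiments: frames are assumed to be flat lists of grayscale values, the
function names are hypothetical, and a pixel is assumed to be foreground when
its absolute deviation from the BG model exceeds the threshold.

```python
def update_running_average(bg_model, frame, n):
    """Fold the n-th training frame (n >= 1) into the per-pixel running average."""
    return [b + (p - b) / n for b, p in zip(bg_model, frame)]


def train_background(training_frames):
    """Training phase: average each pixel position over all training frames."""
    bg_model = [0.0] * len(training_frames[0])
    for n, frame in enumerate(training_frames, start=1):
        bg_model = update_running_average(bg_model, frame, n)
    return bg_model


def extract_foreground(bg_model, frame, threshold):
    """Differentiating phase: True marks a pixel classified as foreground.

    Assumption: a pixel is foreground when it deviates from the BG model
    by more than the threshold.
    """
    return [abs(p - b) > threshold for p, b in zip(frame, bg_model)]


# Three tiny 3-pixel training frames, then one frame to classify.
bg = train_background([[10, 20, 30], [12, 22, 28], [11, 18, 32]])
fg_mask = extract_foreground(bg, [11, 60, 29], threshold=5)
print(bg)       # [11.0, 20.0, 30.0]
print(fg_mask)  # [False, True, False]
```

The incremental update needs only the current BG model and a frame count, so
the training frames themselves do not have to be stored.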

Since the total capacity of block RAM (BRAM) on the target platform is only
360K bits, only one monochrome 640x480 (307,200 bits) frame can be
accommodated. Here, every bit represents one pixel, and the difference between
the same pixel of two frames is either 0 or 1. For natural mapping from our
general PSDF model of FG/BG extraction, we need one storage subsystem for the
BG model and another one for the current frame.
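The storage constraint can be checked with simple arithmetic. In the sketch
below, the constant names are hypothetical, and interpreting "360K bits" as
360 x 1024 bits is an assumption (the conclusion is the same with K = 1000).

```python
FRAME_WIDTH = 640
FRAME_HEIGHT = 480
BITS_PER_PIXEL = 1                  # monochrome: every bit represents one pixel
BRAM_CAPACITY_BITS = 360 * 1024     # assumption: K = 1024

frame_bits = FRAME_WIDTH * FRAME_HEIGHT * BITS_PER_PIXEL
frames_that_fit = BRAM_CAPACITY_BITS // frame_bits

print(frame_bits)        # 307200
print(frames_that_fit)   # 1
```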

In this implementation of the model in Figure 16, actors BG_model and
current_frame "own" their associated image storage and are controlled by
actor switch. Actor FG_extractor takes two frames from actors BG_model and
current_frame for differentiation and determines which parts of the frame are
foreground. A non-uniform array of threshold values, shown in Figure 17, is
derived from the training processes. The actual threshold values represented
in Figure 17 are regarded as parameters, which can be adapted based on video
stream characteristics.

In this thresholding approach, multiple pixels are grouped into individual
blocks, and the sum (number of 1-valued pixels) in a block is computed to
characterize the block and compare it with the corresponding block-based
threshold. These block-based thresholds characterize entire blocks with a
single operation; that is, the entire block is characterized as foreground or
background based on the associated threshold comparison. For example, if the
pixel sum associated with block 2 is larger than t2, the block is classified
as being part of the foreground. For more detailed background on this
thresholding approach, we refer the reader to [25].
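The block-based comparison can be illustrated with the following Python sketch.
It is a software illustration only, with hypothetical function names. It
assumes that the per-block sum is taken over the 1-bit differences between the
BG model and the current frame, and it uses the eight-pixel block size from our
experiments.

```python
BLOCK_SIZE = 8  # pixels per block


def block_sums(bg_bits, cur_bits, block_size=BLOCK_SIZE):
    """Count, per block, the pixels that differ between BG model and current frame."""
    # For 1-bit pixels, the difference of two pixels is their XOR (0 or 1).
    diffs = [b ^ c for b, c in zip(bg_bits, cur_bits)]
    return [sum(diffs[i:i + block_size]) for i in range(0, len(diffs), block_size)]


def classify_blocks(sums, thresholds):
    """True marks a foreground block: its sum is larger than its own threshold."""
    return [s > t for s, t in zip(sums, thresholds)]


# Two 8-pixel blocks with per-block thresholds t1 = t2 = 4.
bg_bits = [0] * 16
cur_bits = [1, 1, 1, 1, 1, 0, 0, 0,   0, 1, 0, 0, 0, 0, 0, 0]
sums = block_sums(bg_bits, cur_bits)
labels = classify_blocks(sums, thresholds=[4, 4])
print(sums)    # [5, 1]
print(labels)  # [True, False]
```

Because the thresholds are passed as a list, each block can carry its own
value, which matches the non-uniform threshold array of Figure 17.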

In our experiments, the threshold values are derived from the process of BG
model training and stored in actor BG_model. One block is composed of eight
pixels. The parameters of subsystem 2 are summarized as follows.

• Init graph:

1. baud rate of RS232 receiver,

2. number of frames for BG_model training;

• Subinit graph:

1. switching on/off the path from actor switch to BG_model,

2. threshold value of discrimination between FG and BG.

Figure 17: A non-uniform array of threshold values.

The overall implementation involves heterogeneous design languages and
platforms: subsystem 1 is implemented in MATLAB and executes on a host PC,
subsystem 2 is implemented in Verilog and executes on the targeted FPGA, and
the dataflow edge between the video_preprocessing and switch components
represents the RS232 channel, where the transmitter and receiver are in the
video_preprocessing and switch components, respectively. The baud rates of the
transmitter and receiver must match. The video_preprocessing component selects
frames and converts them into monochrome format based on luminance levels. The
modular hardware mapping process developed in Section 5 is adopted throughout
the implementation process. The parameters of subsystem 1 are summarized as
follows.

• Init graph:

1. baud rate of RS232 transmitter,

2. luminance level;

• Subinit graph:

1. frame selection.

Figure 18 shows our experimental setup. We use MATLAB to read the video from
files on the host platform, and divide the video data into frames. To reduce
the volume of serial communication, we select every tenth frame and convert it
to monochrome format based on a luminance level threshold of 0.3. The threshold
value is set to 4, which was the value that we obtained from the training
process described in Section 6.
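The preprocessing steps just described (frame selection followed by
luminance-based conversion to monochrome) can be sketched as follows. Our
implementation of this subsystem is in MATLAB; the Python below is only an
illustration with hypothetical function names. It assumes RGB components scaled
to the range [0, 1], Rec. 601 luminance weights, selection starting from the
first frame, and a strict "greater than" comparison against the luminance
level.

```python
def select_frames(frames, step=10):
    """Keep every step-th frame, starting with the first one (assumption)."""
    return frames[::step]


def luminance(rgb):
    """Luminance of one RGB pixel with components in [0, 1].

    Assumption: Rec. 601 weights; the weights are not specified in the text.
    """
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def to_monochrome(frame, level=0.3):
    """Map each RGB pixel to one bit: 1 if its luminance exceeds the level."""
    return [1 if luminance(px) > level else 0 for px in frame]


# 25 dummy frame indices: indices 0, 10, and 20 are selected.
selected = select_frames(list(range(25)))
print(selected)  # [0, 10, 20]

# One tiny 4-pixel frame: white, black, dark gray, mid gray.
frame = [(1.0, 1.0, 1.0), (0.0, 0.0, 0.0), (0.2, 0.2, 0.2), (0.5, 0.5, 0.5)]
mono = to_monochrome(frame, level=0.3)
print(mono)  # [1, 0, 0, 1]
```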


[Diagram labels: COM1 (RS232); Xilinx FPGA Spartan 3E Starter; VGA output;
Background model; Foreground extraction result]

Figure 18: Foreground/background extraction on FPGA, and associated experimental setup.

Results from our experimentation are illustrated in Figure 19 and Figure 20.
Figure 19 shows six frames from a video sequence. Figure 19(a) shows the
common background scene for the sequence, while Figures 19(b)-(f) show frames
in which a man runs from left to right. These six frames, after processing by
our implementation of foreground/background extraction on the FPGA, are shown
in Figures 20(a)-(f), respectively. Here, the red rectangle indicates the
subtracted foreground. The black pixels outside of the subtracted foreground
are due to the movement of trees caused by an ambient breeze. Distortion from
this phenomenon can be reduced by a more sophisticated algorithm [29]. Our
results shown in Figures 20(a)-(f) demonstrate that the image of the running
man can be extracted correctly as foreground.

Figure 19: Selected frames from a video sequence.

Table 5 summarizes synthesis results obtained when deriving the FPGA
implementation. The maximum frequency is 84.062 MHz. By the maximum frequency,
we mean the highest clock frequency at which the FPGA logic can execute
without violating the timing constraints. The video frame rate is set in our
experiments to 30 frames/second. The utilization of BRAMs is high since they
are used to store the video frames, whereas the utilization of FPGA slices is
relatively low because the running average algorithm does not require complex
computation.


Table 5: Device utilization summary.

Selected Device: 3s500efg320-4

  Resource                      Used    Available    Utilization
  Number of slices              133     4656         2%
  Number of Slice Flip Flops    90      9312         0%
  Number of 4 input LUTs        245     9312         2%
  Number of IOs                 16
  Number of bonded IOBs         11      232          4%
  Number of BRAMs               16      20           80%
  Number of MULT18X18SIOs       1       20           5%
  Number of GCLKs               1       24           4%

Figure 20: Results for Figure 19 after processing by our FPGA-based
implementation of foreground/background extraction.

In summary, our experiments involving foreground/background extraction on an
FPGA demonstrate the correctness and completeness of our proposed PSDF-based
approach for FPGA mapping on a practical video processing system. Tuning
application parameters at run-time is an important feature for advanced image
processing applications, which we seek to support in this work. However,
conventional dataflow approaches, including SDF, do not allow such run-time
parameter tuning. For this reason, we have focused in our experiments on the
PSDF model, and on novel application of this model to dynamically
parameterized image processing on FPGAs. The experiments demonstrate the
capability of the PSDF model to express the behavior of this application, and
show that the abstract properties of PSDF, which provide formal, model-based
manipulation of scheduling and dynamic reconfiguration, can be integrated with
the platform-specific details required to achieve a fully operational
implementation.

7.  Conclusions 

In this paper, we have motivated the use of dynamically reconfigurable
hardware platforms for signal processing systems, and have presented
background on hardware methods for dynamic reconfiguration. We have then shown
how the integration of parameterized dataflow techniques with the synchronous
dataflow model of computation, which yields the parameterized synchronous
dataflow (PSDF) modeling approach, can be applied as an abstract model for
design and implementation of dynamically reconfigurable signal processing
systems.

We have demonstrated a PSDF-based design methodology and associated simulation
tool, called PSDFsim, for design and implementation of signal processing
systems on dynamically reconfigurable platforms. We have also demonstrated the
use of dataflow schedule graphs as a formal model for representing and
manipulating hardware mappings of PSDF graphs throughout the design process.
We have discussed the use of these methods to help streamline the processes of
rapid prototyping, heterogeneous system design, hardware mapping, and
implementation. Our experiments show improvements in simulation efficiency and
in the quality of synthesized solutions. Furthermore, in contrast to ad-hoc
techniques for applying dynamic parameter control to SDF graphs or other kinds
of design subsystems, the PSDF-based approach that we have presented provides
for well-structured integration of parameter management into the SDF
framework. This leads to more efficient and reliable techniques for
application of dynamically reconfigurable platforms.

Important directions for further work include exploration of hardware mapping
techniques for more general forms of parameterized dataflow, such as
parameterized cyclo-static dataflow and parameterized fractional rate dataflow
[40, 9, 45], and techniques for mapping parameterized dataflow graphs into
platform FPGAs considering more thoroughly the available sets of heterogeneous
resource groups (e.g., hard and soft core processors and application-specific
accelerators).